Deterministic boundary recognition and topology extraction for large sensor networks



Alexander Kröller, Sándor P. Fekete, Dennis Pfisterer, Stefan Fischer



Abstract 

We present a new framework for the crucial challenge of 
self-organization of a large sensor network. The basic 
scenario can be described as follows: We are given a large
swarm of immobile sensor nodes that have been scattered in a
polygonal region, such as a street network. Nodes have
no knowledge of size or shape of the environment or the 
position of other nodes. Moreover, they have no way of 
measuring coordinates, geometric distances to other nodes, 
or their direction. Their only way of interacting with other 
nodes is to send or to receive messages from any node that 
is within communication range. The objective is to develop 
algorithms and protocols that allow self-organization of the 
swarm into large-scale structures that reflect the structure 
of the street network, setting the stage for global routing, 
tracking and guiding algorithms. 

Our algorithms work in two stages: boundary recog- 
nition and topology extraction. All steps are strictly de- 
terministic, yield fast distributed algorithms, and make no 
assumption on the distribution of nodes in the environment, 
other than sufficient density. 

1 Introduction 

In recent years, the study of wireless sensor networks
(WSN) has become a rapidly developing research area 
that offers fascinating perspectives for combining tech- 
nical progress with new applications of distributed com- 
puting. Typical scenarios involve a large swarm of 
small and inexpensive processor nodes, each with lim- 
ited computing and communication resources, that are 
distributed in some geometric region; communication is 
performed by wireless radio with limited range. As en- 
ergy consumption is a limiting factor for the lifetime of a 
node, communication has to be minimized. Upon start- 
up, the swarm forms a decentralized and self-organizing 
network that surveys the region. 

From an algorithmic point of view, the charac- 
teristics of a sensor network require working under a 
paradigm that is different from classical models of computation:
absence of a central control unit, limited
capabilities of nodes, and limited communication be- 
tween nodes require developing new algorithmic ideas 
that combine methods of distributed computing and 
network protocols with traditional centralized network 
algorithms. In other words: How can we use a limited 
amount of strictly local information in order to achieve 
distributed knowledge of global network properties? 

This task is much simpler if the exact location of 
each node is known. Computing node coordinates has 
received a considerable amount of attention. Unfortu- 
nately, computing exact coordinates requires the use 
of special location hardware like GPS, or alternatively, 
scanning devices, imposing physical demands on size 
and structure of sensor nodes. As we demonstrated in 
our paper ^U], current methods for computing coordi- 
nates based on anchor points and distance estimates en- 
counter serious difficulties in the presence of even small 
inaccuracies, which are unavoidable in practice. 

When trying to extract robust cluster structures 
from a huge swarm of nodes scattered in a street network 
of limited size, trying to obtain individual coordinates 
for all nodes is not only extremely difficult, but may 
indeed turn out to be a red-herring chase. As shown 
in [H], there is a way to sidestep many of the above
difficulties, as some structural location aspects do not 
depend on coordinates. This is particularly relevant for 
sensor networks that are deployed in an environment 
with interesting geometric features. (See [5] for a more 
detailed discussion.) Obviously, scenarios such as the one
shown in Figure 1 pose a number of interesting geometric
questions. Conversely, exploiting the basic fact
that the communication graph of a sensor network has 
a number of geometric properties provides an elegant 
way to extract structural information. 

One key aspect of location awareness is boundary 
recognition, making sensors close to the boundary of 
the surveyed region aware of their position. This is of 
major importance for keeping track of events entering 
or leaving the region, as well as for communication with 
the outside. More generally, any unoccupied part of the 
region can be considered a hole, not necessarily because




(a) 60,000 sensor nodes, uniformly distributed in a polygonal region.
(b) A zoom into (a) shows the communication graph.
(c) A further zoom shows the communication ranges.

Figure 1: Scenario of a geometric sensor network, ob- 
tained by scattering sensor nodes in the street network 
surrounding Braunschweig University of Technology. 

of voids in the geometric region, but also because of 
insufficient coverage, fluctuations in density, or node 
failure due to catastrophic events. Neglecting the 
existence of holes in the region may also cause problems 
in communication, as routing along shortest paths tends 
to put an increased load on nodes along boundaries, 
exhausting their energy supply prematurely; thus, a 
moderately-sized hole (caused by obstacles, by an event, 
or by a cluster of failed nodes) may tend to grow larger 
and larger. (See jjj.) Therefore, it should be stressed 
that even though in our basic street scenario holes in 
the sensor network are due to holes in the filled region, 
our approach works in other settings as well. 

Once the boundary of the swarm is obtained, it can 
be used as a stepping stone for extracting further struc- 
tures. This is particularly appealing in our scenario, in 
which the polygonal region is a street network: In that 
scenario, we have a combination of interesting geomet- 
ric features, a natural underlying structure of moderate 
size, as well as a large supply of practical and relevant 
benchmarks that are not just some random polygons, 
but readily available from real life. More specifically, 
we aim at identifying the graph in which intersections 
are represented by vertices, and connecting streets are 
represented by edges. This resulting cluster structure 
is perfectly suited for obtaining useful information for 
purposes like routing, tracking or guiding. Unlike an 
arbitrary tree structure that relies on the performance 
of individual nodes, it is robust. 



Related Work: [5] is the first paper to introduce 
a communication model based on quasi-unit disk graphs 
(QUDGs). A number of articles deal with node co- 
ordinates; most of the mathematical results are nega- 
tive, even in a centralized model of computation. [3] 
shows that unit disk graph (UDG) recognition is NP- 
hard, while shows NP-hardness for the more re- 
stricted setting in which all edge lengths are known. 
[T2"] shows that QUDG recognition, i.e., UDG approx- 
imation, is also hard; finally, 0] show that UDG em- 
bedding is hard, even when all angles between edges are 
known. The first paper (and to the best of our knowledge,
the only one so far) describing an approximate
UDG embedding is : however, the approach is centralized
and probabilistic, yielding (with high probability)
an $O(\log^{2.5} n \sqrt{\log\log n})$-approximation.

There are various papers dealing with heuristic 
localization algorithms; e.g., see [HI E3 E3 ED E] • In 
this context, see our paper for an experimental 
study pointing out the serious deficiencies of some of 
the resulting coordinates. 

Main Results: Our main result is the construc- 
tion of an overall framework that allows a sensor node 
swarm to self-organize into a well-structured network 
suited for performing tasks such as routing, tracking 
or other challenges that result from popular visions of 
what sensor networks will be able to do. The value of 
the overall framework is based on the following aspects: 

• We give a distributed, deterministic approach for 
identifying nodes that are in the interior of the 
polygonal region, or near its boundary. Our al- 
gorithm is based on topological considerations and 
geometric packing arguments. 

• Using the boundary structure, we describe a dis- 
tributed, deterministic approach for extracting the 
street graph from the swarm. This module also 
uses a combination of topology and geometry. 

• The resulting framework has been implemented 
and tested in our simulation environment Shawn; 
we display some experimental results at the end of 
the paper. 

The rest of this paper is organized as follows. In 
Section 2 we describe the underlying models
and introduce necessary notation. Section 3 deals with 
boundary recognition. This forms the basis for topo- 
logical clustering, described in Section 4. Section 5 de- 
scribes some computational experiments with a realistic 
network. 



2 Models and Notation 

Sensor network: A sensor network is modeled
by a graph $G = (V,E)$, with an edge between any two
nodes that can communicate with each other. For a
node $v \in V$, we define $N_k(v)$ to be the set of all nodes
that can be reached from $v$ within at most $k$ edges. The
set $N(v) = N_1(v)$ contains the direct neighbors of $v$,
i.e., all nodes $w \in V$ with $vw \in E$. For convenience we
assume that $v \in N(v)$ for all $v \in V$. For a set $U \subseteq V$, we
define $N_k(U) := \bigcup_{u \in U} N_k(u)$. The size of the largest
$k$-hop neighborhood is denoted by $\Delta_k := \max_{v \in V} |N_k(v)|$.
Notice that for geometric radio networks with even
distribution, $\Delta_k = O(k^2 \Delta_1)$ is a reasonable assumption.
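
The neighborhood notation can be made concrete with a small centralized sketch. It assumes the graph is given as an adjacency map; the function names `k_hop_neighborhood` and `max_k_hop_size` are illustrative and not part of the paper. A breadth-first search truncated at depth $k$ yields $N_k(v)$, and the maximum over all nodes yields $\Delta_k$.

```python
from collections import deque

def k_hop_neighborhood(adj, v, k):
    """Return N_k(v): all nodes within at most k edges of v (v included)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

def max_k_hop_size(adj, k):
    """Return Delta_k, the size of the largest k-hop neighborhood."""
    return max(len(k_hop_neighborhood(adj, v, k)) for v in adj)

# Example graph (an assumption for illustration): the path 0 - 1 - 2 - 3 - 4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

On the example path, `k_hop_neighborhood(path, 2, 1)` contains the nodes 1, 2 and 3, in line with the convention $v \in N(v)$.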

Each node has a unique ID of size $O(\log |V|)$. The
identifier of a node $v$ is simply $v$.

Every node is equipped with local memory of
size $O(\Delta_{O(1)} \log |V|)$. Therefore, each node can store
a subgraph consisting of nodes that are at most $O(1)$
hops away, but not the complete network.

Computation: Storage limitation is one of the 
main reasons why sensor networks require different 
algorithms: Because no node can store the whole 
network, simple algorithms that collect the full problem 
data at some node to perform centralized computations 
are infeasible. 

Due to the distributed nature of algorithms, the 
classic means to describe runtime complexity are not 
sufficient. Instead, we use separate message and time 
complexities: The former describes the total number 
of messages that are sent during algorithm execution. 
The time complexity describes the total runtime of the 
algorithm over the whole network. 

Both complexities depend heavily on the compu- 
tational model. For our theoretical analysis, we use a 
variant of the well-established CONGEST model [H| : 
All nodes start their local algorithms at the same time 
(simultaneous wakeup). The nodes are synchronized,
i.e., time runs in rounds that are the same for all nodes. 
In a single round, a node can perform any computation 
for which it has complete data. All messages arrive at 
the destination node at the beginning of the subsequent 
round, even if they have the same source or destination. 
There are no congestion or message loss effects. The size 
of a message is limited to $O(\log |V|)$ bits. Notice that
this only affects the message complexity, as there is
no congestion. We will use messages of larger sizes in 
our algorithms, knowing that they can be broken down 
into smaller fragments of feasible size. 

Geometry: All sensor nodes are located in the
two-dimensional plane, according to some mapping
$p : V \to \mathbb{R}^2$. It is a common assumption that the ability
to communicate depends on the geometric arrangement 
of the nodes. There exists a large number of different 



models that formalize this assumption. Here we use the 
following reasonable model: 

We say $p$ is a $d$-Quasi Unit Disk Embedding of $G$ for
parameter $d \le 1$, if both

$uv \in E \Longrightarrow \|p(u) - p(v)\|_2 \le 1$
$uv \in E \Longleftarrow \|p(u) - p(v)\|_2 \le d$

hold. $G$ itself is called a $d$-Quasi Unit Disk Graph ($d$-QUDG)
if an embedding exists. A 1-QUDG is called
a Unit Disk Graph (UDG). Throughout this paper we
assume that $G$ is a $d$-QUDG for some $d \ge \sqrt{2}/2$. The
reason for this particular bound lies in Lemma 3.1,
which is crucial for the feasibility of our boundary
recognition algorithm. The network nodes know the
value of $d$, and the fact that $G$ is a $d$-QUDG. The
embedding $p$ itself is not available to them.

An important property of our algorithms is that 
they do not require a specific distribution of the nodes. 
We only assume the existence of the embedding p. 
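
The two implications that define a $d$-quasi unit disk embedding can be checked directly when an embedding is at hand. The following is a minimal centralized sketch for illustration only; the nodes in the paper never see $p$, and the function name and the example coordinates are assumptions.

```python
import math
from itertools import combinations

def is_quasi_unit_disk_embedding(nodes, edges, pos, d):
    """Check both QUDG conditions for every node pair.

    An edge requires distance <= 1; distance <= d requires an edge.
    Pairs at distance in (d, 1] may or may not be connected.
    """
    edge_set = {frozenset(e) for e in edges}
    for u, v in combinations(nodes, 2):
        dist = math.dist(pos[u], pos[v])
        has_edge = frozenset((u, v)) in edge_set
        if has_edge and dist > 1:
            return False  # violates: uv in E implies distance <= 1
        if not has_edge and dist <= d:
            return False  # violates: distance <= d implies uv in E
    return True

# Hypothetical example: three nodes on a line, d = 0.8
nodes = ["a", "b", "c"]
pos = {"a": (0.0, 0.0), "b": (0.5, 0.0), "c": (1.4, 0.0)}
```

Here the pair a, b (distance 0.5) must be an edge, the pair b, c (distance 0.9) is optional, and the pair a, c (distance 1.4) must not be an edge.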

3 Boundary Recognition 

This section introduces algorithms that detect the 
boundary of the region that is covered by the sensor 
nodes. First, we present some properties of QUDGs. 
These allow deriving geometric knowledge from the net- 
work graph without knowing the embedding p. Then 
we define the Boundary Detection Problem, in which 
solutions are geometric descriptions of the network ar- 
rangement. Finally, we describe a start procedure and 
an augmentation procedure. Together, they form a local 
improvement algorithm for boundary detection. 

3.1 QUDG Properties. We start this section with
a simple property of QUDGs. The special case where
$d = 1$ was originally proven by Breu and Kirkpatrick
[3]. Recall that we assume $d \ge \sqrt{2}/2$.

Lemma 3.1. Let $u, v, w, x$ be four different nodes in $V$,
where $uv \in E$ and $wx \in E$. Assume the straight-line
embeddings of $uv$ and $wx$ intersect. Then at least one
of the edges in $F := \{uw, ux, vw, vx\}$ is also in $E$.

Proof. We assume $p(u) \ne p(v)$; otherwise the lemma
is trivial. Let $a := \|p(u) - p(v)\|_2 \le 1$. Consider two
circles of common radius $d$ with their centers at $p(u)$
and $p(v)$, respectively. The distance between the two
intersection points of these circles is
$h := 2\sqrt{d^2 - \tfrac{1}{4}a^2} \ge 1$. If $F$
and $E$ were disjoint, $p(w)$ and $p(x)$ would both have to lie
outside the two circles. Because the edge embeddings
intersect, this gives $\|p(w) - p(x)\|_2 > h \ge 1$, which would
contradict $wx \in E$.
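
The bound on $h$ used in the proof of Lemma 3.1 follows from the two standing assumptions $a \le 1$ and $d \ge \sqrt{2}/2$. Spelled out as a short worked step (the half-chord of two circles of radius $d$ whose centers are $a$ apart is $\sqrt{d^2 - (a/2)^2}$):

```latex
\[
  h \;=\; 2\sqrt{d^2 - \tfrac{1}{4}a^2}
    \;\ge\; 2\sqrt{\tfrac{1}{2} - \tfrac{1}{4}}
    \;=\; 1,
  \qquad \text{using } d^2 \ge \tfrac{1}{2} \text{ and } a^2 \le 1 .
\]
```

This also shows why the threshold $\sqrt{2}/2$ appears: for smaller $d$ the right-hand side can drop below 1, and the contradiction with $wx \in E$ is lost.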

Lemma 3.1 allows us to use edges in the graph to separate
nodes in the embedding $p$, even without knowing
$p$. We can use this fact to certify that a node is inside
the geometric structure defined by some other nodes.
Let $C \subseteq V$ be a chordless cycle in $G$, i.e., $(C, E(C))$ is
a connected 2-regular subgraph of $G$. $P(C)$ denotes the
polygon with a vertex at each $p(v)$, $v \in C$, and an edge
between vertices whose corresponding nodes are adjacent
in $G$. $P(C)$ also defines a decomposition of the
plane into faces. A point in the infinite face is said to
be outside of $P(C)$; all other points are inside.

Corollary 3.1. Let $C$ be a chordless cycle in $G$, and
let $U \subseteq V$ be connected. Also assume $N(C) \cap U = \emptyset$.
Then either the nodes in $U$ are all on the outside of
$P(C)$, or all on the inside.

This follows directly from Lemma 3.1. So we can
use chordless cycles for defining cuts that separate inside 
from outside nodes. Our objective is to certify that 
a given node set is inside the cycle, thereby providing 
insight into the network's geometry. Unfortunately, this 
is not trivial; however, it is possible to guarantee that a 
node set is outside the cycle. 

Note that simply using two node sets that are
separated by a chordless cycle C and proving that the 
first set is outside the cycle does not guarantee that 
the second set is on the inside. The two sets could be 
on different sides of P(C). So we need more complex 
arguments to certify insideness. 

Now we present a certificate for being on the
outside. Define $\mathrm{fit}_d(n)$ to be the maximum number
of independent nodes $J$ that can be placed inside a
chordless cycle $C$ of at most $n$ nodes in any $d$-QUDG
embedding such that $J \cap N(C) = \emptyset$. We say that
nodes are independent if there is no edge between any
two of them. These numbers exist because independent
nodes are placed at least $d$ from each other, so there
is a certain area needed to contain the nodes. On the
other hand, $C$ defines a polygon of perimeter at most
$|C|$, which cannot enclose arbitrarily large areas. Also
define $\mathrm{enc}_d(m) := \min\{n : \mathrm{fit}_d(n) \ge m\}$, the minimum
length needed to fit $m$ nodes.

The first 20 values of $\mathrm{fit}_1$ and $\mathrm{fit}_{1-\varepsilon}$ for some small
$\varepsilon$ are shown in Table 1. They can be obtained by
considering hexagonal circle packings. Because these
are constants, it is reasonable to assume that the first
few values of $\mathrm{fit}_d$ are available to every node.

We are not aware of the exact values of $\mathrm{fit}_d$ for all
$d$. However, our algorithms just need upper bounds for
$\mathrm{fit}_d$, and lower bounds for $\mathrm{enc}_d$. (An implementation of
the following algorithms has to be slightly adjusted to
use bounds instead of exact values.)
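
To illustrate how bounds can stand in for exact values, the sketch below derives a lower bound on $\mathrm{enc}_d(m)$ from a table of upper bounds on $\mathrm{fit}_d$. If the table overestimates $\mathrm{fit}_d$, the smallest $n$ whose table entry reaches $m$ can only be smaller than or equal to the true $\mathrm{enc}_d(m)$. The table contents below are placeholder numbers chosen for illustration; they are not the values of Table 1, and the function name is an assumption.

```python
def enc_lower_bound(m, fit_upper):
    """Smallest cycle length n whose fit upper bound admits m nodes.

    fit_upper maps a cycle length n to an upper bound on fit_d(n).
    Returns None if no tabulated length is large enough.
    """
    feasible = [n for n, bound in fit_upper.items() if bound >= m]
    return min(feasible) if feasible else None

# Placeholder numbers for illustration only (NOT the values of Table 1).
EXAMPLE_FIT_UPPER = {3: 0, 4: 0, 5: 0, 6: 0, 7: 1, 8: 1, 9: 2, 10: 3}
```

With these placeholder entries, `enc_lower_bound(2, EXAMPLE_FIT_UPPER)` evaluates to 9, and a request beyond the tabulated range returns `None`.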

Now we can give a simple criterion to decide that a 
node set is outside a chordless cycle: 



Lemma 3.2. Let $C$ be a chordless cycle and $I \subseteq V \setminus N(C)$
be a connected set that contains an independent
subset $J \subseteq I$. If $|J| > \mathrm{fit}_d(|C|)$, then every node in $I$ is
outside $P(C)$.

Proof. By Corollary 3.1 and the definition of $\mathrm{fit}_d$.
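
The criterion of Lemma 3.2 can be sketched as a centralized check. This is only an illustration under stated assumptions: the graph is an adjacency map, the independent subset $J$ is found greedily (any independent subset works for the lemma), and `fit_bound` is an upper bound on $\mathrm{fit}_d(|C|)$ supplied by the caller. All function names are hypothetical. A return value of `False` means that no certificate was found, not that the set is inside.

```python
def build_adj(edges):
    """Build a symmetric adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def is_connected(adj, nodes):
    """Check that the subgraph induced by nodes is connected."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in nodes and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes

def greedy_independent_subset(adj, nodes):
    """Pick nodes in ID order, skipping neighbors of chosen nodes."""
    chosen, blocked = [], set()
    for v in sorted(nodes):
        if v not in blocked:
            chosen.append(v)
            blocked.add(v)
            blocked.update(adj[v])
    return chosen

def certifies_outside(adj, cycle, node_set, fit_bound):
    """True if Lemma 3.2 certifies that node_set lies outside P(cycle)."""
    cycle_and_nbrs = set(cycle)
    for c in cycle:
        cycle_and_nbrs |= adj[c]
    if cycle_and_nbrs & set(node_set):
        return False  # the lemma needs I to avoid N(C)
    if not is_connected(adj, node_set):
        return False  # the lemma needs I to be connected
    return len(greedy_independent_subset(adj, node_set)) > fit_bound

# Example: a 4-cycle and a separate path of five nodes
cycle = [0, 1, 2, 3]
adj = build_adj([(0, 1), (1, 2), (2, 3), (3, 0),
                 (10, 11), (11, 12), (12, 13), (13, 14)])
```

In the example, the path contains three independent nodes (10, 12, 14), so it is certified as outside whenever the bound for a 4-cycle is at most 2.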

3.2 Problem statement. In this section, we define 
the Boundary Detection Problem. Essentially, we are 
looking for node sets and chordless cycles, where the 
former are guaranteed to be on the inside of the latter. 
For the node sets to become large, the cycles have to 
follow the perimeter of the network region. In addition, 
we do not want holes in the network region on the inside 
of the cycles, to ensure that each boundary is actually 
reflected by some cycle. 

We now give formal definitions for these concepts. 
We begin with the definition of a hole: The graph
$G$ and its straight-line embedding w.r.t. $p$ define a
decomposition of the plane into faces. A finite face $F$
of this decomposition is called an $h$-hole with parameter $h$
if the boundary length of the convex hull of $F$ strictly
exceeds $h$. An important property of an $h$-hole $F$ is the
following fact: Let $C$ be a chordless cycle with $|C| \le h$.
Then all points $f \in F$ are on the outside of $P(C)$.

To describe a region in the plane, we use chord- 
less cycles in the graph that follow the perimeter of the 
region. There is always one cycle for the outer perime- 
ter. If the region has holes, there is an additional cycle 
for each of them. We formalize this in the opposite di- 
rection: Given the cycles, we define the region that is 
enclosed by them. So let $\mathcal{C} := (C_b)_{b \in \mathcal{B}}$ be a family of
chordless cycles in the network. It describes the boundary
of the region $A(\mathcal{C}) \subseteq \mathbb{R}^2$, which is defined as follows.
First let $\tilde{A}$ be the set of all points $x \in \mathbb{R}^2$ for which the
cardinality of $\{b \in \mathcal{B} : x \text{ is on the inside of } P(C_b)\}$ is
odd. This set gives the inner points of the region, which
are all points that are surrounded by an odd number of
boundaries. The resulting region is defined by

(3.1) $A(\mathcal{C}) := \bigcup_{b \in \mathcal{B}} P(C_b) \cup \tilde{A}$.

See Figure 2 for an example with some cycles and the
corresponding region.
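
The odd-parity rule behind (3.1) can be sketched with a standard ray-casting test. This is an illustration only: it assumes each boundary polygon is given as a list of coordinates (which the nodes themselves never know), it treats only the inner points of the region, and points exactly on a polygon edge are not handled specially. The function names are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count crossings of a ray going right from point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def is_inner_point(point, polygons):
    """True if point is enclosed by an odd number of boundary polygons."""
    count = sum(point_in_polygon(point, poly) for poly in polygons)
    return count % 2 == 1

# Example: an outer boundary with one hole boundary inside it
outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
```

A point between the two boundaries, such as (1, 1), is enclosed by one polygon and belongs to the region; the point (5, 5) is enclosed by two and lies in the hole.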

We can use this approach to introduce geometry
descriptions. These consist of some boundary cycles
$(C_b)_{b \in \mathcal{B}}$, and node sets $(I_i)_{i \in \mathcal{I}}$ that are known to
reside within the described region. The sets are used
instead of direct representations of $A(\mathcal{C})$, because we
seek descriptions that are completely independent of the
actual embedding of the network. There is a constant
$K$ that limits the size of holes. We need $K$ to be large
enough to find cycles in the graph that cannot contain
$K$-holes. Values $K \approx 15$ fulfill these needs.

Table 1: First values of $\mathrm{fit}_d(n)$.

Figure 2: Area described by four boundary cycles.



Definition 3.1. A feasible geometry description
(FGD) is a pair $(\mathcal{C}, (I_i)_{i \in \mathcal{I}})$ with $\mathcal{C} = (C_b)_{b \in \mathcal{B}}$ of node
set families that fulfills the following conditions:

(F1) Each $C_b$ is a chordless cycle in $G$ that does not
contain any node from the family $(I_i)_{i \in \mathcal{I}}$.

(F2) There is no edge between different cycles.

(F3) For each $v \in I_i$ ($i \in \mathcal{I}$), $p(v) \in A(\mathcal{C})$.

(F4) For every component $A'$ of $A(\mathcal{C})$, there is an
index $i \in \mathcal{I}$, such that $p(v) \in A'$ for all $v \in I_i$ and
$p(v) \notin A'$ for all $v \in I_j$, $j \ne i$.

(F5) $A(\mathcal{C})$ does not contain an inner point of any
$k$-hole for $k > K$.

Note that condition (F4) correlates some cycles with a
component of $A(\mathcal{C})$, which in turn can be identified by
an index $i \in \mathcal{I}$. This index is denoted by $\mathrm{IC}(v)$, where
$v \in V$ is part of such a cycle or the corresponding $I_i$.

See Figure [3] in the computational experience sec- 
tion for an example network. Figures |H1 and \7\ show 
different FGDs in this network. 

We are looking for an FGD that has as many inside
nodes as possible, because that forces the boundary 
cycles to follow the network boundary as closely as 
possible. The optimization problem we consider for 
boundary recognition is therefore 



(3.2) (BD)  $\max \left| \bigcup_{i \in \mathcal{I}} I_i \right|$
            s.t. $((C_b)_{b \in \mathcal{B}}, (I_i)_{i \in \mathcal{I}})$ is an FGD



3.3 Algorithm. We solve (BD) with local improvement
methods that switch from one FGD to another of
larger $|\bigcup_{i \in \mathcal{I}} I_i|$. In addition to the FGD, our algorithms
maintain the following sets:

• The set $C := \bigcup_{b \in \mathcal{B}} C_b$ of cycle nodes.

• $N(C)$, the cycle neighbors. Notice $C \subseteq N(C)$.

• $I := \bigcup_{i \in \mathcal{I}} I_i$, the inner nodes. Our algorithms
ensure $I \cap N(C) = \emptyset$ (this is no FGD requirement),
and all $I_i$ will be connected sets. This is needed in
several places for Lemma 3.2 to be applicable.

• $J \subseteq I$, consisting of so-called independent inners.
These nodes form an independent set in $G$.

• The set $U := V \setminus (N(C) \cup I)$ of unexplored nodes.

Initially, $U = V$; all other sets are empty.

We need to know how many independent nodes are
in a given $I_i$ as proof that a cycle cannot accidentally
surround $I_i$. Because all considered cycles consist of at
most $K$ nodes, every count exceeding $\mathrm{fit}_d(K)$ has the
same implications. So we measure the mass of an $I_i$ by

(3.3) $M(i) := \min\{|J \cap I_i|, \mathrm{fit}_d(K) + 1\}$.
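
A one-function sketch of the mass in (3.3), assuming the sets are available as Python collections and that `fit_bound_K` stands for $\mathrm{fit}_d(K)$ or an upper bound on it (both the function name and the parameter name are illustrative). The cap means that a node only ever has to store a number between 0 and $\mathrm{fit}_d(K) + 1$.

```python
def mass(independent_inners, component, fit_bound_K):
    """M(i): number of independent inners in I_i, capped at fit_d(K) + 1."""
    count = len(set(independent_inners) & set(component))
    return min(count, fit_bound_K + 1)
```

For example, a component containing three of the independent inners has mass 3 as long as the cap is larger, while a component with ten of them and `fit_bound_K = 5` has the saturated mass 6.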

Because we are interested in distributed algorithms, 
we have to consider what information is available at the 
individual nodes. Our methods ensure (and require) 
that each node knows to which of the above sets it 
belongs. In addition, each cycle node $v \in C$ knows
$\mathrm{IC}(v)$, $M(\mathrm{IC}(v))$, and $N(v) \cap C$, and each cycle neighbor
$w \in N(C)$ knows $N(w) \cap C$.

The two procedures are described in the following 
two sections: First is an algorithm that produces 
start solutions, second an augmentation method that 
increases the number of inside nodes. 

3.4 Flowers. So far, we have presented criteria by
which one can decide that some nodes are outside a
chordless cycle, based on a packing argument. Such a
criterion will not work for the inside, as any set of nodes
that fits in the inside can also be accommodated by the
unbounded outside. Instead, we now present a stronger
structural criterion that is based on a particular subgraph,
an $m$-flower. For such a structure, we can prove
that there are some nodes on the inside of a chordless
cycle. Our methods start by searching for flowers, leading
to an FGD. We begin by actually defining a flower;
see Figure 3 for a visualization.

Definition 3.2. An $m$-flower in $G$ is an induced subgraph
whose node set consists of a seed $f_0 \in V$, independent
nodes $f_{1,1}, \ldots, f_{1,m} \in V$, bridges $f_{2,1}, \ldots, f_{2,m} \in V$,
hooks $f_{3,1}, \ldots, f_{3,m} \in V$, and chordless paths
$W_1, \ldots, W_m$, where each $W_j = (w_{j,1}, \ldots, w_{j,\ell_j}) \subseteq V$.




Figure 3: A 5-flower.

Figure 4: Construction of a 4-flower in a dense region.



All of these $1 + 3m + \sum_{j=1}^{m} \ell_j$ nodes have to be different
nodes. For convenience, we define $f_{j,0} := f_{j,m}$ and
$f_{j,m+1} := f_{j,1}$ for $j = 1, 2, 3$.

The edges of the subgraph are the following: The
seed $f_0$ is adjacent to all independent nodes: $f_0 f_{1,j} \in E$
for $j = 1, \ldots, m$. Each independent node $f_{1,j}$ is
connected to two bridges: $f_{1,j} f_{2,j} \in E$ and $f_{1,j} f_{2,j+1} \in E$.
The bridges connect to the hooks: $f_{2,j} f_{3,j} \in E$ for
$j = 1, \ldots, m$. Each path $W_j$ connects two hooks, that
is, $f_{3,j} w_{j,1}, w_{j,1} w_{j,2}, \ldots, w_{j,\ell_j} f_{3,j+1}$ are edges in $E$.

Finally, the path lengths $\ell_j$, $j = 1, \ldots, m$, obey

(3.4) $\mathrm{fit}_d(5 + \ell_j) < m - 2$,

(3.5) $\mathrm{fit}_d(7 + \ell_j) <$



Notice that Equations (3.4) and (3.5) can be fulfilled:
for $d = 1$, $m = 5$ and $\ell_1 = \ell_2 = \cdots = \ell_5 = 3$ are
feasible. This is the flower shown in Figure 3.
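
The combinatorial part of Definition 3.2 can be sketched by generating the edge set that an $m$-flower must induce. The node labels (tuples such as `("f", 1, j)` and `("w", j, i)`) and the function name are assumptions made for this illustration; the length conditions (3.4) and (3.5) are not checked here.

```python
def flower_edges(m, lengths):
    """Return the edge set of an m-flower with path lengths l_1..l_m.

    Nodes are labeled ("f", 0, 0) for the seed, ("f", r, j) for the
    independent nodes (r=1), bridges (r=2) and hooks (r=3), and
    ("w", j, i) for the i-th node of path W_j.
    """
    assert len(lengths) == m
    edges = set()

    def add(a, b):
        edges.add(frozenset((a, b)))

    seed = ("f", 0, 0)
    for j in range(1, m + 1):
        nxt = j % m + 1  # cyclic successor, so index m+1 wraps to 1
        indep, bridge, hook = ("f", 1, j), ("f", 2, j), ("f", 3, j)
        add(seed, indep)            # seed to independent node
        add(indep, bridge)          # independent node to its two bridges
        add(indep, ("f", 2, nxt))
        add(bridge, hook)           # bridge to hook
        path = [("w", j, i) for i in range(1, lengths[j - 1] + 1)]
        chain = [hook] + path + [("f", 3, nxt)]
        for a, b in zip(chain, chain[1:]):
            add(a, b)               # path W_j between consecutive hooks
    return edges
```

For $m = 5$ and all path lengths equal to 3, this produces $1 + 3 \cdot 5 + 15 = 31$ distinct nodes, matching the node count stated in the definition.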

The beauty of flowers lies in the following fact: 

Lemma 3.3. In every $d$-QUDG embedding of an $m$-flower,
the independent nodes are placed on the inside
of $P(C)$, where $C := \{f_{3,1}, \ldots, f_{3,m}\} \cup \bigcup_{j=1}^{m} W_j$ is a
chordless cycle.

Proof. Let $P_j := (f_{1,j}, f_{2,j}, f_{3,j}, W_j, f_{3,j+1}, f_{2,j+1})$ be a
petal of the flower. $P_j$ defines a cycle of length $5 + \ell_j$.
The other nodes of the flower are connected and contain
$m - 2$ independent bridges. According to (3.4), this
structure is on the outside of $P(P_j)$.

Therefore, the petals form a ring of connected
cycles, with the seed on either the inside or the outside
of the structure. Assume the seed is on the outside.
Consider the infinite face of the straight-line embedding
of the flower. The seed is part of the outer cycle, which
consists of $7 + \ell_j$ nodes for some $j \in \{1, \ldots, m\}$. This
cycle has to contain the remaining flower nodes, which
contradicts (3.5). Therefore, the seed is on the inside,
and the claim follows.



Because we do not assume a particular distribution
of the nodes, we cannot be sure that there is a flower
in the network. Intuitively, this is quite clear, as any
node may be close to the boundary, so that there are
no interior nodes; as the nodes can only make use of
the local graph structure, and have no direct way of
detecting region boundaries, this means that for low
densities everywhere, our criterion may fail. In the
following, we show the existence of a flower if
there is a densely populated region somewhere: We say
$G$ is locally $\varepsilon$-dense in a region $A \subseteq \mathbb{R}^2$ if every $\varepsilon$-ball
in $A$ contains at least one node, i.e.,
$\forall z \in \mathbb{R}^2 : B_\varepsilon(z) \subseteq A \Longrightarrow \exists v \in V : \|p(v) - z\|_2 \le \varepsilon$.



Lemma 3.4. Let $0 < \varepsilon < \tfrac{3}{2} - \sqrt{2} \approx 0.086$. Assume
$d = 1$. If $G$ is $\varepsilon$-dense on the disk $B_3(z)$ for some
$z \in \mathbb{R}^2$, then $G$ contains a 4-flower.



Proof. Let $w := 2(\tfrac{1}{2} - \varepsilon)$. See Figure 4. Place an
$\varepsilon$-ball at all the indicated places and choose a node in
each. Then the induced subgraph will contain precisely
the drawn edges. Then $m = 4$ and $\ell_1 = \cdots = \ell_4 = 3$,
so for $d = 1$, these $\ell$-numbers are feasible.

Now we present the actual algorithm to detect
flowers. Notice that a flower is a strictly local structure,
so we use a very simple kind of algorithm. Each
node $v \in V$ performs the following phases after the
simultaneous wakeup:

1. Collect the subgraph on $N_8(v)$.

2. Find a flower.

3. Announce update.

4. Update.

Collect: First, each node $v \in V$ collects and stores
the local neighborhood graph $N_8(v)$. This can be done
in time $O(\Delta_1)$ and message complexity $O(\Delta_1 \Delta_8)$ if
every node broadcasts its direct neighborhood to its
8-neighborhood.

Find Flower: Then, every node decides for itself 
whether it is the seed of a flower. This does not involve 
any communication. 

Announce update: Because there could be multiple
intersecting flowers, the final manifestation of flowers
has to be scheduled: Every seed of some flower broadcasts
an announcement to all nodes of the flower and
their neighbors. Nodes that receive multiple announcements
decide which seed has higher priority, e.g., higher
ID number. The seeds are then informed whether they
lost such a tie-break. This procedure has runtime $O(1)$
and message complexity $O(\Delta_9)$ per seed, giving a total
message complexity of $O(\Delta_9 |V|)$.

Update: The winning seeds now inform their flow- 
ers that the announced updates can take place. This is 
done in the same manner as the announcements. The 



nodes that are part of a flower store their new status
and the additional information described in Section 3.3.
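
The tie-break of the announce phase can be simulated centrally as a sketch. It assumes each seed is identified by a comparable ID and that its announcement reaches a known set of nodes (the flower and its neighbors); the name `winning_seeds` and the dictionary layout are illustrative. Following the description above, a seed loses as soon as any node that received its announcement also received one from a seed with a higher ID, even if that other seed loses elsewhere.

```python
def winning_seeds(footprints):
    """Return the seeds that win every local tie-break.

    footprints maps a seed ID to the set of nodes its announcement
    reaches. Two seeds conflict if these sets intersect; the seed
    with the lower ID loses the tie-break.
    """
    losers = set()
    seeds = sorted(footprints)
    for i, a in enumerate(seeds):
        for b in seeds[i + 1:]:
            if footprints[a] & footprints[b]:
                losers.add(a)  # a < b, so the lower ID loses
    return set(seeds) - losers

# Example: seeds 1, 2, 3 overlap in a chain; seed 4 is isolated
example_footprints = {
    1: {"a", "b"},
    2: {"b", "c"},
    3: {"c", "d"},
    4: {"x"},
}
```

In the example, seed 1 loses to seed 2 and seed 2 loses to seed 3, so only seeds 3 and 4 proceed to the update phase. The losing seeds would simply not manifest their flowers in this round.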

3.5 Augmenting Cycles. Now that we have an
algorithm to construct an initial FGD in the network,
we seek an improvement method. For that, we employ
augmenting cycles. Consider an FGD $((C_b)_{b \in \mathcal{B}}, (I_i)_{i \in \mathcal{I}})$.
Let $U = (u_1, u_2, \ldots, u_{|U|}) \subseteq V$ be a (not necessarily
chordless) cycle. For convenience, define $u_0 := u_{|U|}$ and
$u_{|U|+1} := u_1$.

When augmenting, we open the cycles $(C_b)_{b \in \mathcal{B}}$
where they follow $U$, and reconnect the ends according
to $U$. Let $U^- := \{u_i \in U : u_{i-1}, u_i, u_{i+1} \in C\}$
and $U^+ := U \setminus C$. The resulting cycle nodes of the
augmentation operation are then $C' := (C \cup U^+) \setminus U^-$.
If $N(U) \cap I = \emptyset$, this will not affect inside nodes, and
it may open some new space for the inside nodes to
discover. In addition, as the new cycle cannot contain a
$|U|$-hole, we can limit $|U|$ to guarantee condition (F5).
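
The set operations of the augmentation step can be sketched as follows. The sketch assumes the current cycle nodes are given as a Python set and the augmenting cycle as a list in cyclic order; the function name `augment` is illustrative, and none of the feasibility checks of the later phases are performed here.

```python
def augment(cycle_nodes, U):
    """Compute U^-, U^+ and the new set of cycle nodes.

    cycle_nodes is the set C of current cycle nodes; U lists the
    nodes u_1, ..., u_|U| of the augmenting cycle in cyclic order.
    """
    n = len(U)
    # U^-: nodes of U whose predecessor, self and successor are all in C
    U_minus = {U[i] for i in range(n)
               if U[i - 1] in cycle_nodes          # U[-1] wraps around
               and U[i] in cycle_nodes
               and U[(i + 1) % n] in cycle_nodes}
    # U^+: nodes of U that are not yet cycle nodes
    U_plus = {u for u in U if u not in cycle_nodes}
    new_cycle_nodes = (cycle_nodes | U_plus) - U_minus
    return U_minus, U_plus, new_cycle_nodes
```

As an example, take the cycle 1-2-3-4-5-6 and the augmenting cycle (2, 3, 4, 5, 9, 8). The stretch 3, 4 is opened and replaced by the detour 8, 9, so the new cycle nodes are 1, 2, 5, 6, 8, 9.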

We use a method that searches for an augmenting
cycle that leads to another FGD with a larger number
of inside nodes, thereby performing one improvement
step. The method is described for a single node
$v_1 \in C$ that searches for an augmenting cycle containing
itself. This node is called the initiator of the search.

It runs in the following phases: 

1. Cycle search. 

2. Check solution. 

(a) Backtrack. 

(b) Query feasibility. 

3. Announce update. 

4. Update. 

Cycle search: $v_1$ initiates the search by passing
around a token. It begins with the token $T = (v_1)$.
Each node that receives this token adds itself to the
end of it and forwards it to a neighbor. When the token
returns from there, the node forwards it to the next
feasible neighbor. If there are no more neighbors, the
node removes itself from the list end and returns the
token to its predecessor.

The feasible neighbors to which $T$ gets forwarded
are all nodes in $V \setminus I$. The only node that may
appear twice in the token is $v_1$, which starts the
"check solution" phase upon reception of the token. In
addition, $T$ must not contain a cycle node between two
cycle neighbors. The token is limited to contain at most $K$
nodes. This phase can be implemented such that no
node (except for $v_1$) has to store any information about
the search. When this phase terminates unsuccessfully,
i.e., without an identified augmenting cycle, the initiator
exits the algorithm.
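
The token passing of the cycle search behaves like a depth-first search in which the token is the current path. The following centralized sketch simulates it under simplifying assumptions: the graph is an adjacency map, neighbors are tried in ID order, and the extra rule about cycle nodes between two cycle neighbors is omitted. The name `search_cycles` and its parameters are illustrative, not part of the paper.

```python
def search_cycles(adj, v1, inner, max_nodes):
    """Yield every token that returns to the initiator v1.

    adj: adjacency map; inner: set I of inner nodes, which the token
    must avoid; max_nodes: limit on the number of nodes in the token.
    Each cycle is reported once per direction of traversal.
    """
    token = [v1]

    def extend():
        last = token[-1]
        for w in sorted(adj[last]):
            if w in inner:
                continue            # feasible neighbors are in V \ I
            if w == v1:
                if len(token) >= 3:
                    yield list(token)   # token closed: a cycle candidate
                continue
            if w in token or len(token) >= max_nodes:
                continue            # no repeats, respect the size limit
            token.append(w)         # node adds itself and forwards
            yield from extend()
            token.pop()             # node removes itself, token returns

    yield from extend()

# Example: a square 0-1-2-3 with an extra node 4 adjacent to 0 and 2
square = {0: [1, 3, 4], 1: [0, 2], 2: [1, 3, 4], 3: [0, 2], 4: [0, 2]}
```

With node 4 marked as inner and a limit of four nodes, the search from node 0 reports the square in both directions. In the distributed setting, each reported token would trigger the "check solution" phase before the search continues.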

Check Solution: When the token gets forwarded
to $v_1$, it describes a cycle. $v_1$ then sends a backtrack
message backwards along $T$:

Backtrack: While the token travels backwards, 
each node performs the following: If it is a cycle node, 
it broadcasts a query containing T to its neighbors, 
which in turn respond whether they would become 
inside nodes after the update. Such nodes are called 
new inners. Then, the cycle node stores the number of 
positive responses in the token. 

A non-cycle node checks whether it would have any
chords after the update. In that case, it cancels the
backtrack phase and informs $v_1$ to continue the cycle
search phase.

Query Feasibility: When the backtrack message
reaches $v_1$, feasibility has been partially checked by the
previous steps. Now, $v_1$ checks the remaining conditions.

Let $\mathcal{I}' := \{\mathrm{IC}(v) : v \in C \cap T\}$. First, it confirms
that for every $i \in \mathcal{I}'$ there is a matching cycle node in the
token that has a nonzero new inner count. Then it picks
an $i' \in \mathcal{I}'$. All new inners of cycle nodes of this IC value
then explore the new inner region that will exist after
the update. This can be done by a BFS that carries the
token. The nodes report back to $v_1$ the IC values of new
inner nodes that could be reached. If this reported set
equals $\mathcal{I}'$, $T$ is a feasible candidate for an update and
phase "announce update" begins. Otherwise, the cycle
search phase continues.

Announce update: Now T contains a feasible 
augmenting cycle. $v_1$ informs all involved nodes that
an update is coming up. These nodes are T, N(T) and 
all nodes that can be reached from any new inner in 
the new region. This is done by a distributed BFS as 
in the "query feasibility" phase. Let I' be the set of all 
nodes that will become inner nodes after the update. 
During this step, the set J of independent nodes is also 
extended in a simple greedy fashion. 

If any node receives multiple update announce- 
ments, the initiator node of higher ID wins. The loser 
is then informed that its announcement failed. 

Update: When the announcement successfully 
reached all nodes without losing a tie-break somewhere, 
the update is performed. 

If there is just one component involved, i.e., $|\mathcal{I}| = 1$, 
the update can take place immediately. 

If $|\mathcal{I}| > 1$, there might be problems keeping 
$M(IC(\cdot))$ accurate if multiple augmentations happen 
simultaneously. So $v_1$ first decides that the new ID of the 
merged component will be $v_1$. It then determines what 
value $M(IC(v_1))$ will take after the update. If this value 
strictly exceeds $\mathrm{fit}_d(K)$, $M(IC(v_1))$ is independent of 
potential other updates; the update can take place 
immediately. However, if $M(IC(v_1)) \le \mathrm{fit}_d(K)$, concurrent 
updates have to be scheduled. So $v_1$ floods the involved 
components with an update announcement, and performs 
its update after all others of higher priority, i.e., 
higher initiator ID. 

Finally, all nodes in T flood their -y-hop neigh- 
borhood so that cycle nodes whose cycle search phase 
was unsuccessful can start a new attempt, because their 
search space has changed. 

Lemma 3.5. If the augmenting cycle algorithm per- 
forms an update on a FGD, it produces another FGD 
with strictly more inner nodes. 

Proof. We need to show that all five FGD conditions are 
met: (F1) and (F2) are checked in the backtrack phase, 
(F3) follows from (3.6), (F4) from the connectivity test 
in the feasibility check phase, and (F5) follows from 
(3.7). The increase in inner nodes is assured in the 
query feasibility phase. 

Lemma 3.6. One iteration of the augmenting cycle 
algorithm for a given initiator node has message 
complexity $O(\Delta^K |V|)$ and time complexity $O(\Delta^K K + |V|)$. 

Proof. There are at most $\Delta^K$ cycles that are checked. 
For one cycle, the backtrack phase takes $O(K)$ message 
and time complexity. The query feasibility phase 
involves flooding the part of the new inside that is 
contained in the cycle. Because there can be any 
number of nodes in this region, message complexity 
for this flood is $O(|V|)$. The flood will be finished 
after at most $2\,\mathrm{fit}_d(K)$ communication rounds; the time 
complexity is therefore $O(1)$. After a feasible cycle was 
found, the announce update and update phases happen 
once. Both involve a constant number of floods over 
the network; their message and time complexities are 
therefore $O(|V|)$. Combining these complexities results 
in the claimed values. 
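
The final combination step can be written out explicitly. The following is a sketch that assumes the symbols of the lemma denote $\Delta^K$ candidate cycles and a per-cycle backtrack cost of $O(K)$, and that $K \le |V|$; these readings are our assumptions about the notation, not additional claims of the proof.

```latex
\begin{align*}
\text{messages:}\quad
  & \underbrace{\Delta^K}_{\text{cycles}}
    \bigl(\underbrace{O(K)}_{\text{backtrack}}
        + \underbrace{O(|V|)}_{\text{feasibility flood}}\bigr)
    + \underbrace{O(|V|)}_{\text{announce, update}}
    = O(\Delta^K |V|), \\
\text{time:}\quad
  & \Delta^K \bigl(O(K) + O(1)\bigr) + O(|V|)
    = O(\Delta^K K + |V|).
\end{align*}
```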

4 Topological Clustering 

This section deals with constructing clusters that follow 
the geometric network topology. We use the working 
boundary detection from the previous section and add 
a method for clustering. 

4.1 Problem statement. We assume the boundary 
cycle nodes are numbered, i.e., $C_b = (c_{b,1}, \ldots, c_{b,|C_b|})$ for 
$b \in B$. We use a measure $d$ that describes the distance 
of nodes in the subgraph $(C, E(C))$: 

$$
d(c_{b,j}, c_{b',j'}) :=
\begin{cases}
+\infty & \text{if } b \neq b' \\
\min\{|j' - j|,\; |C_b| - |j' - j|\} & \text{if } b = b'
\end{cases}
$$


That is, d assigns nodes on the same boundary their 
distance within this boundary, and $\infty$ to nodes on 
different boundaries. 
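
As a minimal sketch of this measure, the function below computes the distance along a boundary cycle for two numbered cycle nodes. The name `boundary_distance` and the lookup table `cycle_len` (boundary label to cycle length) are hypothetical and only serve the illustration.

```python
import math

def boundary_distance(b1, j1, b2, j2, cycle_len):
    """Distance between cycle nodes (b1, j1) and (b2, j2).

    cycle_len maps a boundary label b to the number of nodes on that
    boundary cycle (hypothetical lookup table for this sketch).
    """
    if b1 != b2:
        return math.inf                    # different boundaries
    diff = abs(j1 - j2)
    return min(diff, cycle_len[b1] - diff)  # shorter way around the cycle

cycle_len = {"outer": 10, "hole": 6}
print(boundary_distance("outer", 1, "outer", 9, cycle_len))  # 2
print(boundary_distance("outer", 1, "hole", 3, cycle_len))   # inf
```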

For each node $v \in V$, let $Q_v \subseteq C$ be the set of cycle 
nodes that have minimal hop-distance to v, and let $s_v$ be 
this distance. These nodes are called anchors of v. Let 
$v \in V$ and $u, w \in N(v)$. We say u and w have distant 
anchors w.r.t. v if there are $q_u \in Q_u$ and $q_w \in Q_w$ such 
that $d(q_u, q_w) > \pi(s_v + 1)$ holds (with $\pi = 3.14\ldots$). 
This generalizes closeness to multiple boundaries to the 
closeness to two separate pieces of the same boundary. 
(Here "separate pieces" means that there is sufficient 
distance along the boundary between the nodes to form 
a half-circle around v.) 
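
The distant-anchor condition can be sketched as a small predicate. The function name, the representation of anchors as plain positions, and the toy distance `ring_distance` are assumptions made for this example; any implementation of the measure d could be passed in instead.

```python
import math

def have_distant_anchors(anchors_u, anchors_w, s_v, d):
    """True if some anchor pair is farther apart than pi * (s_v + 1)."""
    threshold = math.pi * (s_v + 1)
    return any(d(qu, qw) > threshold for qu in anchors_u for qw in anchors_w)

def ring_distance(j1, j2, n=40):
    """Toy distance on a single boundary cycle with n nodes."""
    diff = abs(j1 - j2)
    return min(diff, n - diff)

# With s_v = 2 the threshold is 3 * pi, roughly 9.42.
print(have_distant_anchors({3}, {20}, s_v=2, d=ring_distance))  # True  (17 > 9.42)
print(have_distant_anchors({3}, {8}, s_v=2, d=ring_distance))   # False (5 < 9.42)
```

Anchors on different boundaries have infinite distance and therefore always count as distant.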

v is called a k-Voronoi node if N(v) contains at 
least k nodes with pairwise distant anchors. We use 
these nodes to identify nodes that are precisely in the 
middle between some boundaries. Let $V_k$ be the set 
of all k-Voronoi nodes. Our methods are based on the 
observation that $V_2$ forms strips that run between two 
boundaries, and $V_3$ contains nodes where these strips 
meet. 

The connected components of $V_3$ are called intersection 
cores. We build intersection clusters around them 
that extend to the boundary. The remaining strips are 
the base for street clusters connecting the intersections. 
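
To illustrate the k-Voronoi test, the sketch below searches a neighborhood for k nodes that are pairwise distant. It uses brute force over all k-subsets, which is an assumption of this example rather than the method of the paper; `distant` is a caller-supplied predicate and `far_pairs` is toy data.

```python
from itertools import combinations

def is_k_voronoi(neighbors, k, distant):
    """True if `neighbors` contains k nodes that are pairwise distant.

    distant(u, w) decides whether u and w have distant anchors.
    Brute force is acceptable for small neighborhoods.
    """
    return any(
        all(distant(u, w) for u, w in combinations(group, 2))
        for group in combinations(neighbors, k)
    )

# Toy symmetric relation: the listed pairs have distant anchors.
far_pairs = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]}

def distant(u, w):
    return frozenset((u, w)) in far_pairs

print(is_k_voronoi(["a", "b", "c", "d"], 3, distant))  # True: a, b, c
print(is_k_voronoi(["a", "b", "d"], 3, distant))       # False: a-d is not distant
```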

4.2 Algorithms. We use the following algorithm for 
the clustering: 

1. Synchronize end of boundary detection. 

2. Label boundaries. 

3. Identify intersection cores. 

4. Cluster intersections and streets. 

Synchronize: The second phase needs to be 
started at all cycle nodes simultaneously, after the 
boundary detection terminates. For this purpose, we use 
a synchronization tree in the network, i.e., a spanning 
tree. Every node in the tree keeps track of whether there 
are any active initiator nodes in its subtree. When the 
synchronization tree root detects that there are no more 
initiators, it informs the cycle nodes to start the second 
phase. Because the root knows the tree depth, it can 
ensure the second phase starts in sync. 

Label: Now the cycle nodes assign themselves con- 
secutive numbers. Within each cycle $C_b$, this starts at 
the initiator node of the last augmentation step. If $C_b$ 
stems from a flower that has not been augmented, some 
node that has been chosen by the flower's seed takes 
this role. This start node becomes $c_{b,1}$. It then sends 
a message around the cycle so that each node knows 
its position. Afterwards, it sends another message with 
the total number of nodes in the cycle. In the end, each 
node $c_{b,j}$ knows b, j, and $|C_b|$. Finally, the root of the 
synchronization tree gets informed about the 
completion of this phase. 
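
The labeling of one cycle can be simulated centrally as follows. This is only a sketch: the function `label_cycle` and the list representation of a cycle are hypothetical, and the two loops stand in for the two messages that travel around the cycle.

```python
def label_cycle(cycle, start):
    """Return {node: (position, cycle_length)} with `start` at position 1.

    `cycle` lists the nodes of one boundary cycle in traversal order.
    """
    n = len(cycle)
    offset = cycle.index(start)
    positions = {}
    # First message: a counter travels around the cycle and fixes each position.
    for step in range(n):
        positions[cycle[(offset + step) % n]] = step + 1
    # Second message: the total cycle length is announced to every node.
    return {node: (j, n) for node, j in positions.items()}

print(label_cycle(["p", "q", "r", "s"], start="r"))
# {'r': (1, 4), 's': (2, 4), 'p': (3, 4), 'q': (4, 4)}
```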

Intersection cores: This phase identifies the in- 
tersection cores. It starts simultaneously at all cycle 
nodes. This is scheduled via the synchronization tree. 
This tree's root knows the tree depth. Therefore, it can 
define a start time for this phase and broadcast a mes- 
sage over the tree that reaches all nodes shortly before 
this time. Then the cycle nodes start a BFS so that 
every node v knows one $q_v \in Q_v$ and $s_v$. The BFS carries 
information about the anchors so that v also knows b 
and j for which $q_v = c_{b,j}$. Also, each node stores this 
information for all of its neighbors. 
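
A centralized sketch of this BFS is given below. It starts from all cycle nodes at once, so the first time a node is reached determines both its hop distance and one nearest anchor. The adjacency-dictionary input and the function name are assumptions for the example; the distributed version would exchange the same data in messages.

```python
from collections import deque

def nearest_anchor_bfs(adj, cycle_labels):
    """Multi-source BFS from all cycle nodes.

    adj: {node: iterable of neighbors}
    cycle_labels: {cycle node: (b, j)}
    Returns {node: (hop distance, (b, j) of one nearest anchor)}.
    """
    info = {c: (0, label) for c, label in cycle_labels.items()}
    queue = deque(cycle_labels)
    while queue:
        u = queue.popleft()
        dist, label = info[u]
        for w in adj[u]:
            if w not in info:              # first visit gives minimal distance
                info[w] = (dist + 1, label)
                queue.append(w)
    return info

# Path graph c1 - x - y - c2 with one cycle node at each end.
adj = {"c1": ["x"], "x": ["c1", "y"], "y": ["x", "c2"], "c2": ["y"]}
info = nearest_anchor_bfs(adj, {"c1": (0, 1), "c2": (1, 1)})
print(info["x"], info["y"])  # (1, (0, 1)) (1, (1, 1))
```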

Each node v checks whether there are three nodes 
$u_1, u_2, u_3 \in N(v)$ whose known anchors are distant, i.e., 
$d(q_{u_j}, q_{u_k}) > \pi(s_v + 1)$ for $j \neq k$. In that case, v declares 
itself to be a 3-Voronoi node. This constructs a set 
$\tilde{V}_3 \subseteq V_3$. 

Finally, the nodes in $\tilde{V}_3$ determine their connected 
components and the maximal value of $s_v$ within each 
component by constructing a tree within each 
component, and assign each component a unique ID number. 

Cluster: Now each intersection core starts a BFS up 
to the chosen depth. Each node receiving a BFS message 
associates with the closest intersection core. This 
constructs the intersection clusters. Afterwards, the re- 
maining nodes determine their connected components 
by constructing a tree within each component, thereby 
forming street clusters. 
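
The clustering step can be sketched centrally as a depth-limited BFS from the cores, followed by a component search on the leftover nodes. For simplicity this example assumes a single global `depth` for all cores; the function name and input format are likewise hypothetical.

```python
from collections import deque

def cluster(adj, cores, depth):
    """cores: {core_id: set of nodes}. Returns (owner, streets).

    owner maps each node within `depth` hops of a core to that core's ID;
    streets lists the connected components of all remaining nodes.
    """
    owner, dist, queue = {}, {}, deque()
    for cid, nodes in cores.items():
        for v in nodes:
            owner[v], dist[v] = cid, 0
            queue.append(v)
    while queue:                           # depth-limited multi-source BFS
        u = queue.popleft()
        if dist[u] == depth:
            continue
        for w in adj[u]:
            if w not in owner:
                owner[w], dist[w] = owner[u], dist[u] + 1
                queue.append(w)
    streets, seen = [], set(owner)
    for v in adj:                          # components of the leftover nodes
        if v in seen:
            continue
        comp, stack = set(), [v]
        seen.add(v)
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        streets.append(comp)
    return owner, streets

# Path 0 - 1 - ... - 6 with one intersection core at each end.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
owner, streets = cluster(adj, {"A": {0}, "B": {6}}, depth=1)
print(owner)    # {0: 'A', 6: 'B', 1: 'A', 5: 'B'}
print(streets)  # [{2, 3, 4}]
```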

Because the synchronization phase runs in parallel 
to the boundary detection algorithm, it makes sense to 
analyze the runtime behaviour of this phase separately: 



Theorem 4.1. The synchronization phase of the 
algorithm has both message and time complexity $O(|V|^3)$. 

Proof. We do not separate between time and message 
complexity, because here they are the same. Constructing 
the tree takes $O(|V| \log |V|)$, and the final flood is 
linear. However, keeping track of the initiators is more 
complex: There can be $O(|V|)$ augmentation steps. In 
each step, $O(|V|)$ nodes may change their status, which has to 
be broadcast over $O(|V|)$ nodes. 

Theorem 4.2. The remaining phases have message 
and time complexity $O(|V| \log |V|)$. 

Proof. The most expensive operation in any of the 
phases is a BFS over the whole network, which takes 
$O(|V| \log |V|)$. 

5 Computational Experience 

Figure 7: Two snapshots and final state of the 
Augmenting Cycle algorithm. 

We have implemented and tested our methods with 
our large-scale network simulator Shawn [11]. We 
demonstrate the performance on a complex scenario, 
shown in FigureEl The network consists of 60,000 nodes 
that are scattered over a street map. To show that 
the procedures do not require a nice distribution, we 
included fuzzy boundaries and varying densities. Notice 
that this network is in fact very sparse: The average 
neighborhood size is approximately 20 in the lightly 
populated and 30 in the heavily populated area. 

Figure [S] shows the FGD that is produced by the 
flower procedure. It includes about 70 flowers, where 
a single one would suffice to start the augmentation. 
Figure 7 shows some snapshots of the augmenting 
cycle method and its final state. In the beginning, 
many extensions to single cycles lead to growing zones. 
In the end, they get merged together by multi-cycle 
augmentations. It is obvious that the final state indeed 
consists of a FGD that describes the real network 
boundaries well. 

Figure 8 shows the Voronoi sets $V_2$ and $V_3$. One can 
clearly see the strips running between the boundaries 
and the intersection cluster cores that are in the middle 
of intersections. Finally, Figure 9 shows the clustering 
that is computed by our method. It consists of the 
intersection clusters around the 3-Voronois, and street 
clusters in the remaining parts. The geometric shape of 
the network area is reflected very closely, even though 
the network had no access to geometric information. 




Figure 8: Identified 2-Voronoi and 3-Voronoi nodes. 




Figure 9: The final clustering. 

6 Conclusions 

In this paper we have described an integrated frame- 
work for the deterministic self-organization of a large 
swarm of sensor nodes. Our approach makes very few 
assumptions and is guaranteed to produce correct re- 
sults; the price is dealing with relatively complex combi- 
natorial structures such as flowers. Obviously, stronger 
assumptions on the network properties, the boundary 
structure or the distribution of nodes allow faster and 
simpler boundary recognition; see our papers |S] and [5] 
for probabilistic ideas. 

Our framework can be seen as a first step towards 
robust routing, tracking and guiding algorithms. We 
are currently working on extending our framework in 
this direction. 